WHAT IS CLAIMED IS:



1. A method of detecting similarity between protein sequences comprising
comparing a first disulfide signature to a second disulfide signature, each disulfide signature
being characteristic of a corresponding protein sequence.

2. The method of claim 1, wherein each disulfide signature describes a disulfide
topology of the corresponding protein sequence.

3. The method of claim 1, wherein each disulfide signature includes the number
of residues between a pair of cysteines joined by a disulfide bridge, and the number of
residues between the first cysteine of each disulfide bridge and the first cysteine of the next
disulfide bridge in the corresponding protein sequence.

4. The method of claim 3, wherein each disulfide signature includes the number
of residues between each pair of cysteines joined by a disulfide bridge, and the number of
residues between the first cysteine of each disulfide bridge and the first cysteine of the next
disulfide bridge in the corresponding protein sequence, for each disulfide bridge in the
corresponding protein sequence.
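
The two quantities recited in claims 3 and 4 can be illustrated with a minimal sketch. It assumes that each disulfide bridge is given as a pair of 1-based cysteine positions and that "residues between" counts only the intervening residues, excluding both cysteines. The function name `disulfide_signature`, the `Bridge` alias, and the two-tuple return layout are hypothetical choices made for illustration and are not taken from the claims.

```python
from typing import List, Tuple

# Hypothetical representation: (position of first cysteine, position of second cysteine)
Bridge = Tuple[int, int]


def disulfide_signature(bridges: List[Bridge]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (loop_sizes, spacings) for a set of disulfide bridges.

    loop_sizes[k]: residues between the two cysteines of bridge k.
    spacings[k]:   residues between the first cysteine of bridge k and the
                   first cysteine of bridge k + 1.
    Assumption: counts exclude the cysteines themselves.
    """
    # Order bridges by the position of their first cysteine.
    ordered = sorted((min(a, b), max(a, b)) for a, b in bridges)
    loop_sizes = tuple(second - first - 1 for first, second in ordered)
    spacings = tuple(
        ordered[k + 1][0] - ordered[k][0] - 1 for k in range(len(ordered) - 1)
    )
    return loop_sizes, spacings


if __name__ == "__main__":
    # Three bridges: Cys3-Cys20, Cys10-Cys30, Cys15-Cys40 (made-up positions)
    print(disulfide_signature([(3, 20), (10, 30), (15, 40)]))
```

A protein with n bridges yields n loop sizes and n - 1 spacings under this convention; a different counting convention (for example, inclusive counts) would shift every value by a constant.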

5. The method of claim 1, wherein comparing includes calculating a measure of
similarity between the first disulfide signature and the second disulfide signature.

6. The method of claim 5, wherein comparing further includes calculating a
measure of statistical relevance for the measure of similarity between the first disulfide
signature and the second disulfide signature.
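
Claims 5 and 6 do not fix a particular similarity measure or statistic. The sketch below is one possible reading, under two assumptions: similarity is the negated sum of absolute differences between signatures flattened into tuples of integers, and statistical relevance is an empirical p-value against a background set of signatures. The names `similarity` and `empirical_p_value` are hypothetical.

```python
from typing import Sequence, Tuple

Signature = Tuple[int, ...]  # assumed flattened form: loop sizes followed by spacings


def similarity(sig_a: Signature, sig_b: Signature) -> float:
    """Assumed measure: 0 for identical signatures, more negative when less similar.

    Signatures of different length (different bridge counts) score -inf.
    """
    if len(sig_a) != len(sig_b):
        return float("-inf")
    return -float(sum(abs(a - b) for a, b in zip(sig_a, sig_b)))


def empirical_p_value(
    observed: float, query: Signature, background: Sequence[Signature]
) -> float:
    """Fraction of background signatures scoring at least as well as `observed`.

    Uses the common (count + 1) / (N + 1) correction so the value is never zero.
    """
    at_least = sum(1 for sig in background if similarity(query, sig) >= observed)
    return (at_least + 1) / (len(background) + 1)


if __name__ == "__main__":
    query = (16, 19, 24, 6, 4)
    target = (15, 19, 25, 6, 5)
    background = [(30, 2, 10, 20, 1), (16, 18, 24, 6, 4), (5, 5, 5, 5, 5)]
    score = similarity(query, target)
    print(score, empirical_p_value(score, query, background))
```

A small p-value under this scheme would indicate that few background signatures match the query as closely as the target does; the quality of the estimate depends entirely on how the background set is chosen.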

7. The method of claim 1, wherein comparing includes searching a database
including a plurality of disulfide signatures, each disulfide signature of the database
characteristic of a corresponding protein sequence.

8. The method of claim 7, wherein comparing includes calculating a measure of
similarity between the first disulfide signature and each of a plurality of disulfide signatures
of the database.
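
The database search of claims 7 and 8 can be sketched as a ranked scan. The example assumes the database is an in-memory mapping from a key to a flattened signature, and it uses a sum of absolute differences as a stand-in distance. The keys, the `search` function, and the `top_n` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Signature = Tuple[int, ...]  # assumed flattened form: loop sizes followed by spacings


def distance(sig_a: Signature, sig_b: Signature) -> int:
    """Assumed measure: sum of absolute differences (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def search(
    query: Signature, database: Dict[str, Signature], top_n: int = 3
) -> List[Tuple[str, int]]:
    """Compare the query to every database signature and return the closest hits.

    Entries whose signature length differs from the query are skipped, since
    this simple distance is only defined for equal bridge counts.
    """
    hits = [
        (distance(query, sig), key)
        for key, sig in database.items()
        if len(sig) == len(query)
    ]
    hits.sort()  # smallest distance first, ties broken by key
    return [(key, dist) for dist, key in hits[:top_n]]


if __name__ == "__main__":
    db = {
        "seqA": (16, 19, 24, 6, 4),
        "seqB": (15, 19, 25, 6, 5),
        "seqC": (10, 12, 3),
    }
    print(search((16, 19, 24, 6, 4), db))
```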
9. The method of claim 7, wherein searching the database includes searching
with a subpattern of the first disulfide signature.

10. The method of claim 9, wherein the subpattern is generated by calculating the
disulfide signature that results when one or more disulfide bridges is removed from the
protein sequence corresponding to the first disulfide signature.
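
Subpattern generation as recited in claim 10 can be sketched by deleting bridges and recomputing the signature from those that remain. The sketch assumes bridges are pairs of 1-based cysteine positions and that residue counts exclude the cysteines. The names `signature`, `subpatterns`, and `max_removed` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Bridge = Tuple[int, int]  # hypothetical: (first cysteine position, second cysteine position)
Signature = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (loop sizes, spacings)


def signature(bridges: List[Bridge]) -> Signature:
    """Loop size per bridge and spacing between consecutive first cysteines."""
    ordered = sorted((min(a, b), max(a, b)) for a, b in bridges)
    loops = tuple(second - first - 1 for first, second in ordered)
    spacings = tuple(
        ordered[k + 1][0] - ordered[k][0] - 1 for k in range(len(ordered) - 1)
    )
    return loops, spacings


def subpatterns(bridges: List[Bridge], max_removed: int = 1) -> List[Signature]:
    """Signatures obtained by removing 1..max_removed bridges.

    At least one bridge is always kept, so no empty signature is produced.
    """
    ordered = sorted((min(a, b), max(a, b)) for a, b in bridges)
    results = []
    for r in range(1, min(max_removed, len(ordered) - 1) + 1):
        for removed in combinations(range(len(ordered)), r):
            kept = [b for i, b in enumerate(ordered) if i not in removed]
            results.append(signature(kept))
    return results


if __name__ == "__main__":
    for sub in subpatterns([(3, 20), (10, 30), (15, 40)]):
        print(sub)
```

Removing a bridge changes the spacing values of its neighbours, which is why the subpattern is recomputed from the remaining bridges rather than obtained by deleting entries from the original signature.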

11. The method of claim 7, wherein at least one disulfide signature in the database
is associated with a sequence identifier.

12. The method of claim 7, wherein at least one disulfide signature in the database 
is associated with a domain identifier. 

13. The method of claim 7, further comprising clustering disulfide signatures of 
the database. 

14. The method of claim 13, wherein clustering includes grouping disulfide 
signatures by number of disulfide bridges. 

15. The method of claim 13, wherein clustering includes grouping disulfide 
signatures by disulfide topology. 

16. The method of claim 13, wherein clustering includes calculating a measure of 
similarity between disulfide signatures and grouping based on the measure of similarity. 
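
The groupings of claims 14 and 15 can be sketched together. The claims do not define how disulfide topology is encoded, so the sketch assumes a connectivity string: cysteines are ranked by sequence position and each bridge is written as a pair of ranks. The names `topology` and `cluster`, and the entry keys, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Bridge = Tuple[int, int]  # hypothetical: (first cysteine position, second cysteine position)


def topology(bridges: List[Bridge]) -> str:
    """Assumed encoding: rank bridged cysteines by position, list bridges as rank pairs."""
    positions = sorted(p for bridge in bridges for p in bridge)
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    pairs = sorted((rank[min(a, b)], rank[max(a, b)]) for a, b in bridges)
    return ",".join(f"{i}-{j}" for i, j in pairs)


def cluster(entries: Dict[str, List[Bridge]]) -> Dict[Tuple[int, str], List[str]]:
    """Group entries first by number of bridges, then by topology string."""
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for key, bridges in entries.items():
        groups[(len(bridges), topology(bridges))].append(key)
    return dict(groups)


if __name__ == "__main__":
    entries = {
        "p1": [(3, 20), (10, 30), (15, 40)],
        "p2": [(5, 25), (12, 33), (18, 50)],
        "p3": [(2, 9), (14, 30), (20, 41)],
        "p4": [(4, 18)],
    }
    for group, members in cluster(entries).items():
        print(group, members)
```

Entries p1 and p2 share a topology despite different spacings, so a similarity-based grouping in the sense of claim 16 could then be applied within each such group.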

17. A method of detecting similarity between protein sequences comprising:
generating a database including a plurality of disulfide signatures, each disulfide
signature being characteristic of a corresponding protein sequence; and
comparing a first disulfide signature corresponding to a protein sequence to at least
one disulfide signature of the database.

18. The method of claim 17, wherein each disulfide signature describes a disulfide 
topology of the corresponding protein sequence. 

19. The method of claim 18, wherein each disulfide signature includes the number
of residues between a pair of cysteines joined by a disulfide bridge, and the number of
residues between the first cysteine of each disulfide bridge and the first cysteine of the next
disulfide bridge in the corresponding protein sequence.

20. The method of claim 19, wherein each disulfide signature includes the number
of residues between each pair of cysteines joined by a disulfide bridge, and the number of
residues between the first cysteine of each disulfide bridge and the first cysteine of the next
disulfide bridge in the corresponding protein sequence, for each disulfide bridge in the
corresponding protein sequence.

21. The method of claim 17, wherein generating the database includes identifying
a disulfide bridge by protein sequence homology or protein structure homology.

22. The method of claim 17, wherein generating the database includes calculating
a disulfide signature for a protein sequence.

23. The method of claim 17, wherein comparing includes calculating a measure of
similarity between the first disulfide signature and the disulfide signature of the database.

24. The method of claim 23, wherein comparing further includes calculating a
measure of statistical relevance for the measure of similarity between the first disulfide
signature and the disulfide signature of the database.

25. The method of claim 17, wherein comparing includes comparing a subpattern
of the first disulfide signature to at least one disulfide signature of the database.

26. The method of claim 25, wherein the subpattern is generated by calculating
the disulfide signature that results when one or more disulfide bridges is removed from the
corresponding protein sequence.

27. The method of claim 17, wherein at least one disulfide signature of the
database is associated with a sequence identifier.

28. The method of claim 17, wherein at least one disulfide signature of the
database is associated with a domain identifier.

29. The method of claim 18, further comprising clustering the disulfide signatures
of the database.

30. The method of claim 29, wherein clustering includes grouping disulfide 
signatures by number of disulfide bridges. 

31. The method of claim 29, wherein clustering includes grouping disulfide
signatures by disulfide topology.

32. The method of claim 29, wherein clustering includes calculating a measure of 
similarity between at least one pair of disulfide signatures and grouping based on the measure 
of similarity. 

33. A method of detecting similarity between protein sequences comprising
generating a database including a plurality of disulfide signatures, each disulfide signature
being characteristic of a corresponding protein sequence.

34. The method of claim 33, wherein each disulfide signature describes a disulfide 
topology of the corresponding protein sequence. 

35. The method of claim 34, wherein each disulfide signature includes the number 
of residues between a pair of cysteines joined by a disulfide bridge, and the number of 
residues between the first cysteine of each disulfide bridge and the first cysteine of the next 
disulfide bridge in the corresponding protein sequence. 

36. The method of claim 35, wherein each disulfide signature includes the number 
of residues between each pair of cysteines joined by a disulfide bridge, and the number of 
residues between the first cysteine of each disulfide bridge and the first cysteine of the next 
disulfide bridge in the corresponding protein sequence, for each disulfide bridge in the 
corresponding protein sequence. 

37. The method of claim 33, wherein generating the database includes identifying
a disulfide bridge by protein sequence homology or protein structure homology.

38. The method of claim 33, wherein generating the database includes calculating
a disulfide signature for a protein sequence.

39. The method of claim 38, wherein calculating the disulfide signature includes
determining the number of residues between a pair of cysteines joined by a disulfide bridge
in the protein sequence.

40. The method of claim 38, wherein calculating the disulfide signature includes
determining the number of residues between the first cysteine of each disulfide bridge and
the first cysteine of the next disulfide bridge in the protein sequence.

41. A computer program for detecting similarity between protein sequences, the
computer program comprising instructions for causing a computer system to compare a first
disulfide signature to a second disulfide signature, each disulfide signature being
characteristic of a corresponding protein sequence.

42. The computer program of claim 41, wherein each disulfide signature includes
the number of residues between a pair of cysteines joined by a disulfide bridge, and the
number of residues between the first cysteine of each disulfide bridge and the first cysteine of
the next disulfide bridge in the corresponding protein sequence.

43. The computer program of claim 42, wherein each disulfide signature includes
the number of residues between each pair of cysteines joined by a disulfide bridge, and the
number of residues between the first cysteine of each disulfide bridge and the first cysteine of
the next disulfide bridge in the corresponding protein sequence, for each disulfide bridge in
the corresponding protein sequence.

44. The computer program of claim 41, wherein comparing includes calculating a
measure of similarity between the first disulfide signature and the second disulfide signature.

45. The computer program of claim 44, wherein comparing further includes
calculating a measure of statistical relevance for the measure of similarity between the first
disulfide signature and the second disulfide signature.

46. The computer program of claim 41, wherein comparing includes searching a
database including a plurality of disulfide signatures, each disulfide signature of the database
characteristic of a corresponding protein sequence.

47. The computer program of claim 46, wherein searching the database includes
searching with a subpattern of the first disulfide signature.

48. The computer program of claim 47, wherein the subpattern is generated by
calculating the disulfide signature that results when one or more disulfide bridges is removed
from the protein sequence corresponding to the first disulfide signature.

49. The computer program of claim 46, wherein at least one disulfide signature in 
the database is associated with a sequence identifier. 

50. The computer program of claim 46, wherein at least one disulfide signature in 
the database is associated with a domain identifier. 

51. The computer program of claim 46, further comprising clustering disulfide
signatures of the database.

52. The computer program of claim 51, wherein clustering includes grouping 
disulfide signatures by number of disulfide bridges. 

53. The computer program of claim 51, wherein clustering includes grouping 
disulfide signatures by disulfide topology. 

54. The computer program of claim 51, wherein clustering includes calculating a
measure of similarity between disulfide signatures and grouping based on the measure of
similarity.

55. A computer-readable data storage medium comprising a data storage material
encoded with a computer-readable database, the database comprising a plurality of disulfide
signatures, each disulfide signature of the database characteristic of a corresponding protein
sequence.

56. The data storage medium of claim 55, wherein each disulfide signature of the
database describes a disulfide topology of the corresponding protein sequence.

57. The data storage medium of claim 55, wherein each disulfide signature
includes the number of residues between a pair of cysteines joined by a disulfide bridge, and
the number of residues between the first cysteine of each disulfide bridge and the first
cysteine of the next disulfide bridge in the corresponding protein sequence.

58. The data storage medium of claim 57, wherein each disulfide signature
includes the number of residues between each pair of cysteines joined by a disulfide bridge,
and the number of residues between the first cysteine of each disulfide bridge and the first
cysteine of the next disulfide bridge in the corresponding protein sequence, for each disulfide
bridge in the corresponding protein sequence.

59. The data storage medium of claim 55, wherein at least one disulfide signature
in the database is associated with a sequence identifier.

60. The data storage medium of claim 55, wherein at least one disulfide signature
in the database is associated with a domain identifier.

61. The data storage medium of claim 55, wherein at least one disulfide signature
in the database is associated with a cluster identifier.
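
A database record carrying the identifiers recited in claims 59 to 61 can be sketched as follows. The claims do not prescribe a storage format, so JSON is assumed purely for illustration, and every field and function name (`SignatureRecord`, `sequence_id`, `domain_id`, `cluster_id`, `dump_database`, `load_database`) is hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SignatureRecord:
    """One database entry: a disulfide signature plus optional identifiers."""

    loops: List[int]      # residues between the cysteines of each bridge
    spacings: List[int]   # residues between consecutive first cysteines
    sequence_id: Optional[str] = None
    domain_id: Optional[str] = None
    cluster_id: Optional[str] = None


def dump_database(records: List[SignatureRecord]) -> str:
    """Serialize records to a JSON string suitable for writing to storage."""
    return json.dumps([asdict(r) for r in records], indent=2)


def load_database(text: str) -> List[SignatureRecord]:
    """Rebuild records from the JSON produced by dump_database."""
    return [SignatureRecord(**item) for item in json.loads(text)]


if __name__ == "__main__":
    records = [
        SignatureRecord([16, 19, 24], [6, 4], "seq0001", "domA", "cluster1"),
        SignatureRecord([13], [], sequence_id="seq0002"),
    ]
    print(dump_database(records))
```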

62. The data storage medium of claim 55, wherein the data storage material is
further encoded with a computer program comprising instructions for causing a computer
system to compare a first disulfide signature to a second disulfide signature, each disulfide
signature being characteristic of a corresponding protein sequence.

63. The data storage medium of claim 62, wherein comparing includes calculating
a measure of similarity between the first disulfide signature and the second disulfide
signature.

64. The data storage medium of claim 63, wherein comparing further includes
calculating a measure of statistical relevance for the measure of similarity between the first
disulfide signature and the second disulfide signature.

65. The data storage medium of claim 62, wherein comparing includes searching
the database.

66. The data storage medium of claim 65, wherein searching the database includes
searching with a subpattern of the first disulfide signature.

67. The data storage medium of claim 66, wherein the subpattern is generated by
calculating the disulfide signature that results when one or more disulfide bridges is removed
from the protein sequence corresponding to the first disulfide signature.

68. A method of describing a protein sequence comprising generating a first
disulfide signature, the disulfide signature describing the cysteine spacing and disulfide
topology of a first protein sequence.

69. The method of claim 68, further comprising identifying a disulfide bridge by
protein sequence homology or protein structure homology.

70. The method of claim 68, further comprising generating a second disulfide
signature, the signature describing the cysteine spacing and disulfide topology of a second
protein sequence.

71. The method of claim 70, further comprising comparing the first disulfide
signature to a second disulfide signature.

72. The method of claim 71, wherein comparing includes calculating a measure of 
similarity between the first disulfide signature and the second disulfide signature. 

73. The method of claim 71, further comprising generating a database including
the first and second disulfide signatures.